Introduction

What is Call of Duty/Warzone?

Call of Duty is a very popular video game series published by Activision. Its free-to-play game Warzone has recently become hugely popular, helped by the rise of Battle Royale games. With that popularity has come a very large and very competitive community. Players began to notice that, although the game has no visible ranking system, their matchmaking experience did not feel random. As a result, players have started to wonder whether a hidden skill-based matchmaking system is present in the game.

Why would someone care about their matchmaking?

In general, all gamers care about their gaming experience. It's obviously not fun to consistently lose, but it's also not fun to consistently win. Finding the balance is very important to staying interested in a game for a long time. So, from both a consumer and a developer standpoint, matchmaking is integral to keeping a video game relevant. Even so, not being able to see how your performance relates to matchmaking removes a significant part of the experience for players. Other very popular video games, such as League of Legends, Apex Legends, Valorant, and even FIFA, show players their ranking and their progression through a ranked tier system. In Warzone, no such system exists, and this leads players to question the skill levels of their opponents and themselves.

In addition, there is also the possibility that Activision has a financial incentive to modify matchmaking, especially at the content creator level. The reasoning behind this is that content creators hold great influence over potential customers, and giving them a good experience might lead to more customers.

What exactly is skill-based matchmaking?

Skill-based matchmaking is a system that matches players in a game based on some ranking. This ranking can be whatever metric the developers choose, but it is often implemented as an Elo score or a custom MMR (matchmaking rating) score tailored to the game.
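To make the idea of a rating concrete, here is a minimal sketch of the classic Elo update for a one-on-one game. It is only an illustration: the scale of 400 and the step size k = 32 are conventional chess values, and nothing here is known to match whatever Warzone actually uses.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score (between 0 and 1) of player A against player B under Elo."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_rating(rating: float, expected: float, actual: float, k: float = 32.0) -> float:
    """Move a rating toward the observed result; k (assumed 32) sets the step size."""
    return rating + k * (actual - expected)


# A 1500-rated player beats a 1700-rated player (actual score 1.0 for a win).
underdog, favorite = 1500.0, 1700.0
e_underdog = expected_score(underdog, favorite)
print(round(e_underdog, 3))
print(round(update_rating(underdog, e_underdog, 1.0), 1))
```

A matchmaker built on such a rating would then group players whose ratings are close, which is exactly the behavior we are trying to detect from the outside.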

What is the goal of our project?

We will be exploring two main questions for this project.

The first question is "Is there skill-based matchmaking in Call of Duty: Warzone?". This is a very common question among the COD user base, and as members of that user base, we wanted to find an answer.

The second question is arguably juicier because it puts Activision in the hot seat. We will be trying to answer "Does Activision purposefully lower the matchmaking difficulty of content creators?".

Because the game has a very passionate community, the answers to these questions can inform the feedback fans give to the developers. As for content creators, players who are also passionate viewers of COD on Twitch or YouTube might rethink their opinions of whoever they watch.

Data Collection

Due to the specific nature of this project, we had to find creative ways to collect data on Call of Duty matchmaking. Luckily, there is a Call of Duty API that lets developers look at past match data and stats for specific players. However, Call of Duty's official API only exposes profiles whose owners have set their visibility to public. Some third-party sites aggregate data across games and paint a clearer picture of players' statistics. One of these is WZStats.gg (Warzone Stats), which shows detailed per-match data. By using their website and its API, we are able to get data on many players to inform our research questions.

Because setting your profile to public visibility is a manual process, better-skilled players are likely to be the ones with visible profiles. This potentially limits our visibility into the player skill spectrum. If there is skill-based matchmaking, the initial accounts we analyze, and their respective game histories, will be biased towards higher skill tiers, because these players care a lot more about their stats than lower-skilled players and are more likely to set their profiles to public. However, if there is no skill-based matchmaking, then the game lobbies will be entirely random as far as skill is concerned (other factors, such as network latency and geographic location, could still play a role). With random lobbies, we should in principle be able to tap into the entire spectrum of players if we analyze enough games.

Since no existing database is available, we needed to write code to create one. We started by assembling a list of profiles with public visibility. This included some of our own accounts as well as those of pro players and content creators. As mentioned previously, looking at content creators' accounts could provide some insight into our second question: whether the skill level of content creators' lobbies is lower.

The data collection process ended up being quite complicated for us. In fact, we spent 6 hours on it and had to try about seven times. Oops. So what went wrong? Our initial collection process was built on viewing the problem through graph theory. Specifically, we wanted to perform a breadth-first traversal over a large number of accounts, to sample the player base as effectively as possible. The plan was to start with the 10 seed accounts described earlier, treat each player as a new node, skip those already visited, and continue by analyzing each person's previous 20 matches. This analysis, or rather data gathering, included capturing all (up to) 150 players per lobby and the lifetime KD of each player in the lobby.

Where was our logic faulty? Breadth-first search only works under the assumption that there is no skill-based matchmaking. If there is no skill-based matchmaking, a breadth-first search lets us branch away from the current lobby and reach many different skill levels quite quickly, within few degrees of separation, if any at all. However, if there is skill-based matchmaking, we would be stuck in the same part of the skill distribution and would be unable to reach the rest of the player base unless our initial seed accounts were perfectly distributed across the spectrum of players, which they are not.

So, we moved to method two. Method two pivots away from the breadth-first search and towards a depth-first search, or rather a graph traversal with a split factor of 2. More specifically, to achieve a better sampling, starting at one player, we randomly sample 2 players from their most recent match lobby. If we can successfully sample 2 accounts with public data settings, we add them to our queue and then repeat the process we ran on the initial player. We aim to do this repeatedly to reach 11 degrees of separation from the original account. We arbitrarily selected a professional content creator's account, NICKMERCS, as the initial account and let the program run overnight, sampling 2^11 = 2048 games. The following diagram shows how we designed our system to collect data.
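The traversal described above can be sketched roughly as follows. This is not our actual scraper: `get_recent_lobby` and `is_public` are hypothetical callbacks standing in for the real data source, and the fake data at the bottom exists only so the sketch runs on its own. Because the description uses a queue, the sketch visits players in breadth-first order while following only two sampled players per lobby.

```python
import random
from collections import deque
from typing import Callable, Dict, List, Optional


def sample_lobbies(
    seed: str,
    get_recent_lobby: Callable[[str], List[str]],  # hypothetical: usernames in latest match
    is_public: Callable[[str], bool],              # hypothetical: profile visibility check
    split: int = 2,
    max_depth: int = 11,
    rng: Optional[random.Random] = None,
) -> Dict[str, int]:
    """Return visited player -> degrees of separation from the seed account."""
    rng = rng or random.Random()
    depth_of: Dict[str, int] = {seed: 0}
    queue = deque([seed])
    while queue:
        player = queue.popleft()
        depth = depth_of[player]
        if depth >= max_depth:
            continue  # do not expand past the requested separation
        lobby = list(get_recent_lobby(player))
        rng.shuffle(lobby)  # random order, so the first public hits are a random sample
        picked = 0
        for candidate in lobby:
            if picked == split:
                break
            # Skip players already visited and players whose data is private.
            if candidate in depth_of or not is_public(candidate):
                continue
            depth_of[candidate] = depth + 1
            queue.append(candidate)
            picked += 1
    return depth_of


# Synthetic stand-ins for the real data source, for illustration only.
def fake_lobby(player: str) -> List[str]:
    r = random.Random(player)  # deterministic fake lobby per player
    return [f"p{r.randrange(500)}" for _ in range(20)]


def fake_is_public(player: str) -> bool:
    return int(player[1:]) % 3 != 0  # pretend a third of profiles are private


visited = sample_lobbies("p1", fake_lobby, fake_is_public, max_depth=4, rng=random.Random(0))
print(len(visited), max(visited.values()))
```

Capping the branching at two keeps the number of requests manageable while still moving several degrees of separation away from the seed account.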

[Figure: diagram of the data collection traversal]

The following code is "wzstats.py", the script we wrote to scrape data from wzstats.gg, the data-aggregating site mentioned previously in this writeup:

The following is a snippet of each dataset that we built.

For our second question, we want to look specifically at the games of pro players and content creators. The following code iterates through the top accounts and gets their games.

Missing Data

From looking at the head of our dataset, we can immediately see that there are lots of NaN values. This might be alarming at first, but there is actually a very good reason for it. As we mentioned previously, not all accounts have public data available. Specifically, a player must go into their account settings and toggle this for every console linked to their Activision/Call of Duty account. This means that people who care about their stats will probably go through the trouble of toggling this setting so that websites like Warzone Stats can display their history in an aggregated manner.

This then raises the question of what type of missing data this is. Our initial hunch is that the data is Missing at Random, specifically that a missing username and platform are related to a player's lifetime_kd. We believe this is a plausible explanation because players who care about their statistics are probably players who also play the game a lot and thus might have a higher lifetime_kd. Let's check this theory with some code!

Below we are going to perform a T-test (https://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm). See the link for much more detailed documentation and notes on what a T-test is.

Comparing the average KDs of users with missing data and users without missing data, we can see that the averages are very different. But just how significant is the difference? We performed a two-sided T-test on the two sets of data. The T score for the difference between the means of these two subsets was -135.158, with 53232.47 degrees of freedom. This corresponds to a p-value that is indistinguishable from zero. This is extremely strong evidence that there is a difference between the average KDs of players who have public and private data settings. Since the missingness depends on an observed variable (lifetime_kd), we conclude that the missing data is Missing at Random (MAR).
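For readers who want to see the mechanics of the test, here is a minimal sketch of a two-sample T-test using only the Python standard library. The fractional degrees of freedom reported above suggest the unequal-variance (Welch) form, but that is an inference, so treat this as one plausible version rather than the exact computation. The two lists are made-up numbers, not our data, and the p-value uses a normal approximation that is only reasonable when the degrees of freedom are large.

```python
import math
from statistics import NormalDist, mean, variance
from typing import Sequence, Tuple


def welch_t_test(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float, float]:
    """Return (t statistic, Welch-Satterthwaite degrees of freedom, approx. two-sided p)."""
    n_a, n_b = len(a), len(b)
    # Squared standard error of each group mean (sample variance / group size).
    se2_a = variance(a) / n_a
    se2_b = variance(b) / n_b
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    # Normal approximation to the t distribution (assumption: df is large).
    p = 2.0 * NormalDist().cdf(-abs(t))
    return t, df, p


# Made-up lifetime KDs for illustration only.
private_kd = [0.6, 0.8, 0.9, 1.0, 0.7, 0.85]
public_kd = [1.1, 1.4, 0.9, 1.6, 1.3, 1.2]
t_stat, dof, p_value = welch_t_test(private_kd, public_kd)
print(t_stat, dof, p_value)
```

A negative T score here means the first group (private accounts in this toy example) has the lower mean.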

Exploratory Data Analysis (EDA) and Data Visualization

A basic histogram doesn't actually show us much. We can see that most KDs fall between 0 and 5, but there are definitely some significantly higher outliers. In practical terms, this could mean that in the 2048 games we analyzed, we came upon players who are either insanely good, better than any professional ever, or players who are hacking. Sustaining an incredibly high lifetime KD is very difficult because of the randomness of games, and even some of the best players will still have KDs around the 6-10 mark.

When we look at players whose KDs are over 6, we still see a very high concentration in the 6 to 10 KD range. However, we also see a second cluster of players at the 20+ KD mark. Because these players are so rare, we can treat them as insignificant outliers. A single one of them can change a lobby's average KD by up to 35/150 ≈ 0.23, which is a large amount of skew, but because we only see 15 of them across 2048 games, the overall difference is negligible.

We can see that there are 15 users with a lifetime KD over 10. Of these 15 users, all 15 are actually private accounts. This is quite interesting because we previously showed that players with higher KDs tend to set their data settings to public. These players seem to be the best 15 by far, and yet their privacy settings are still set to private. While it is entirely possible that some extremely good players do not care enough to change the setting, it is unlikely that ALL 15 follow the same logic.

Moreover, in practice, having a KD that high as a LIFETIME KD and not a SINGLE GAME KD is extremely unlikely. This would require players to drop 10+, 20+, or even 30+ kills per game consistently while limiting their deaths to 1 or 2. Note that for KD calculations, to avoid divide-by-zero errors, COD counts 0 deaths as 1 death (e.g. 35 kills 0 deaths = 35 kills 1 death). Because even professional players and content creators are unable to achieve this level of success, we can reasonably assume that these 15 players are one of two things. They are either brand new accounts with maybe 1 or 2 insanely good games, in which case their lifetime KD and single game KDs might be very similar, OR they are hackers who get lots of kills over large numbers of games using hacks like aimbot and other cheats.

Also note that we said brand new account and not brand new player. The reason is that players can have numerous accounts, and it is possible that a professional player, content creator, or anyone for that matter, created a new account and had a very, very good first game (or several). However, the odds of this happening are actually very low for another reason. In Call of Duty: Warzone, one way to improve your chances of winning is to level up your guns and unlock new attachments and other perks (https://www.dexerto.com/call-of-duty/best-warzone-loadouts-class-setup-1342383/). This can only be done by playing the game for an extensive amount of time, usually requiring 10s if not 100s of games to complete all the achievements needed to level up your equipment and profile. Inherently, this means that a very good player on a brand new account still faces this challenge and is severely disadvantaged when entering a game for the first time. Thus, from a practical standpoint, it is more likely that these users are hackers or bots and not real, legitimate players.
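As a small illustration of the zero-death rule and of why a brand new account can show an extreme lifetime KD, consider the sketch below. The definition of lifetime KD as total kills over total deaths is our assumption for the example, and the game records are invented.

```python
from typing import Iterable, Tuple


def kd_ratio(kills: int, deaths: int) -> float:
    """Kill/death ratio, counting zero deaths as one death as described above."""
    if kills < 0 or deaths < 0:
        raise ValueError("kills and deaths must be non-negative")
    return kills / max(deaths, 1)


def lifetime_kd(games: Iterable[Tuple[int, int]]) -> float:
    """Lifetime KD as total kills over total deaths (an assumed definition)."""
    total_kills = total_deaths = 0
    for kills, deaths in games:
        total_kills += kills
        total_deaths += deaths
    return kd_ratio(total_kills, total_deaths)


print(kd_ratio(35, 0))                          # 35.0, same as 35 kills and 1 death
print(lifetime_kd([(35, 0)]))                   # one great game on a new account
print(lifetime_kd([(35, 0)] + [(3, 3)] * 50))   # the same game diluted by 50 average ones
```

The last line shows how quickly one exceptional game stops mattering once an account has a real history, which is why a sustained lifetime KD above 10 is so suspicious.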

Now that we have talked about the outliers, let's look back towards the more realistic end of the player spectrum.

We are also curious about the frequency of specific KDs: are some values very frequent and others very infrequent? We can see from the histogram that KDs are generally around 1, but we wonder if a clearer picture emerges when we look at a finer granularity.

We can see that there are 2 major outliers here where ~250 people have the same KD, which is very unlikely to happen, especially at this scale. Let's take a look at what their KDs are.

We ended up looking for every KD value shared by over 100 people. Very surprisingly, the most common KDs, by far, are {0, 1/3, 1/2, 2/3, 1}. In practice, this is probably a sign of players who are playing their first games, or have played very few games, since these simple ratios are very common when the number of games is small. Another possibility is rounding on the API side for players with limited data.
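The lookup described above can be sketched as a simple frequency count. This is an illustration with invented values rather than the notebook's actual code; rounding to two decimals is an assumption made so that values like 1/3 group together.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def common_kds(kds: Iterable[float], min_count: int = 100, ndigits: int = 2) -> Dict[float, int]:
    """Return rounded KD values shared by more than `min_count` players."""
    # NaN values (private or missing data) are skipped before counting.
    counts = Counter(round(kd, ndigits) for kd in kds if not math.isnan(kd))
    return {kd: n for kd, n in sorted(counts.items()) if n > min_count}


# Invented sample: two heavily repeated ratios plus a few ordinary values.
sample = [1 / 3] * 150 + [0.5] * 120 + [1.37, 0.92, 2.4] + [float("nan")] * 10
print(common_kds(sample))
```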

Let's also do a little bit of analysis specifically on the lobbies of games with highly skilled players.

Hypothesis Testing and Evaluation of Null Model

If we assume that COD Warzone does not matchmake lobbies on the basis of skill, or KD, we should expect lobbies to be a random sample of individuals from the distribution of KD. We will analyze how well the actual observations correspond to this theoretical model to determine the fit of this null model.

Given our null model, if the ~2000 observed lobby average lifetime KDs follow this model, their percentiles (the model's CDF at the observed value) should be uniformly distributed. To test this, we calculate the percentile of each observation by computing the theoretical distribution of the lobby average given the lobby's size, and then evaluating the CDF of that distribution at the lobby average KD actually observed.

Again, if the null model is correct, and lobbies are a simple random sample of the population of players, we should see uniformly distributed percentiles according to this model over the sample of lobby average KDs.
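One way to carry out this check is sketched below. It assumes the lobby average under the null model is approximately normal with mean equal to the population mean and standard deviation equal to the population standard deviation divided by the square root of the lobby size (a central limit theorem approximation). That is an assumption of this sketch; a resampling approach would be another reasonable choice. All numbers are invented.

```python
import math
from statistics import NormalDist
from typing import Sequence


def null_percentile(lobby_mean: float, lobby_size: int, pop_mean: float, pop_sd: float) -> float:
    """CDF of an observed lobby mean under the random-lobby null (normal approximation)."""
    null_dist = NormalDist(mu=pop_mean, sigma=pop_sd / math.sqrt(lobby_size))
    return null_dist.cdf(lobby_mean)


def ks_uniform_statistic(percentiles: Sequence[float]) -> float:
    """Kolmogorov-Smirnov distance between the percentiles and Uniform(0, 1)."""
    u = sorted(percentiles)
    n = len(u)
    # Largest gap between the empirical CDF and the line y = x, on either side.
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(u))


# Made-up population parameters and lobby averages, for illustration only.
pop_mean, pop_sd = 1.0, 0.8
lobby_means = [1.31, 1.12, 1.27, 1.40, 1.19]
pcts = [null_percentile(m, 150, pop_mean, pop_sd) for m in lobby_means]
print(pcts)
print(ks_uniform_statistic(pcts))
```

A statistic near 0 means the percentiles look uniform, while a value near 1 means they pile up at one end, which is what we would expect if lobbies are built around skill.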

These percentiles are not uniformly distributed, so it is extremely unlikely that lobbies are generated in a way that randomly samples the population of players, i.e. ignores skill.

Hypothesis Testing for Question 2

We can now see at face value that the means and standard deviations are very different when we compare top players' games to the general population's games. Let's conduct a T-test again to check whether the difference between the two data sets is large enough to support a confident conclusion.

Based on these T-test results, we can say with strong likelihood that the top players' games and the general population's games are significantly different. We will address this more directly in the conclusion.

Conclusion


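In regards to question 1, we can reject the null model in which lobbies are a simple random sample of the player population. In the hypothesis testing, the percentiles of the observed lobby average KDs under that model were not uniformly distributed, so it is extremely unlikely that Warzone builds lobbies without regard to skill. This supports the presence of some form of skill-based matchmaking.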

In regards to question 2, we can reject the claim that pro players and content creators get easier lobbies. In the hypothesis testing, we showed that the average lifetime KD of a player in a top player's game is 6.88 standard deviations from the mean, while the average lifetime KD of a player in any player's game is -1.09 standard deviations from the mean. In addition, the T-test reflected a significant difference as well. Our data therefore indicates that pro players actually have more difficult lobbies than the general population, which contradicts the hypothesis that their lobbies are made easier.
